Indexing Threads within Grid and Blocks

We have seen how to get the thread ID for threads in a single block: index = threadIdx.x; is sufficient as long as there is only one block.

But that index is not unique within the grid if the grid has multiple blocks.

For example:

blockIdx.x=0
threadIdx.x=0 threadIdx.x=1 threadIdx.x=2 threadIdx.x=3
blockIdx.x=1
threadIdx.x=0 threadIdx.x=1 threadIdx.x=2 threadIdx.x=3
blockIdx.x=2
threadIdx.x=0 threadIdx.x=1 threadIdx.x=2 threadIdx.x=3
blockIdx.x=3
threadIdx.x=0 threadIdx.x=1 threadIdx.x=2 threadIdx.x=3
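
The collision illustrated above can be checked with a short sketch. It is only an illustration, not part of the original example: the kernel name countThreadIdx and the host helper threadIdxHistogram are hypothetical, error checking is omitted for brevity, and a CUDA-capable device and nvcc are assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread increments the counter that corresponds to its threadIdx.x.
// If threadIdx.x were unique across the grid, every counter would end at 1.
__global__ void countThreadIdx(int* counts)
{
    atomicAdd(&counts[threadIdx.x], 1);
}

// Hypothetical host helper: launches `blocks` blocks of `threadsPerBlock`
// threads and returns how many threads reported each threadIdx.x value.
std::vector<int> threadIdxHistogram(int blocks, int threadsPerBlock)
{
    std::vector<int> counts(threadsPerBlock, 0);
    size_t bytes = counts.size() * sizeof(int);

    int* dCounts = nullptr;
    cudaMalloc(&dCounts, bytes);
    cudaMemset(dCounts, 0, bytes);

    countThreadIdx<<<blocks, threadsPerBlock>>>(dCounts);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(counts.data(), dCounts, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dCounts);
    return counts;
}
```

For the layout shown above, threadIdxHistogram(4, 4) should return {4, 4, 4, 4}: every threadIdx.x value occurs once per block, so it is shared by four different threads.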

To get a unique index for each thread, we have to make use of the blockIdx variable as well.
Just as the index of a thread within a block is given by

  1. threadIdx.x
  2. threadIdx.y
  3. threadIdx.z

the index of a block within the grid is given by these variables:

  1. blockIdx.x
  2. blockIdx.y
  3. blockIdx.z

To summarise, we have the following built-in variables: the sizes of the grid and of each block are given by gridDim and blockDim, and the indices of a block within the grid and of a thread within its block are given by blockIdx and threadIdx. Each component is an unsigned integer.

  1. dim3 gridDim; (size of the grid)
    • gridDim.x: number of blocks in x
    • gridDim.y: number of blocks in y
    • gridDim.z: number of blocks in z
  2. dim3 blockDim; (size of each block)
    • blockDim.x: number of threads in x
    • blockDim.y: number of threads in y
    • blockDim.z: number of threads in z
  3. uint3 blockIdx; (index of a block within the grid)
    • blockIdx.x: index of the block in x direction
    • blockIdx.y: index of the block in y direction
    • blockIdx.z: index of the block in z direction
  4. uint3 threadIdx; (index of a thread within its block)
    • threadIdx.x: index of the thread in x direction
    • threadIdx.y: index of the thread in y direction
    • threadIdx.z: index of the thread in z direction

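gridDim and blockDim are set through the execution configuration (the <<<grid, block>>> arguments) when the kernel is launched; blockIdx and threadIdx are then assigned automatically to each block and thread.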
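
As a rough sketch of how the launch configuration shows up inside a kernel, the code below has a single thread copy gridDim and blockDim into memory that the host can read back. The names recordDims and queryLaunchDims are hypothetical, error checking is omitted, and a CUDA-capable device is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread copies the built-in dimension variables into global memory
// so that the host can inspect them after the launch.
__global__ void recordDims(unsigned int* out)
{
    bool firstThread = threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0;
    bool firstBlock  = blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0;
    if (firstThread && firstBlock) {
        out[0] = gridDim.x;  out[1] = gridDim.y;  out[2] = gridDim.z;
        out[3] = blockDim.x; out[4] = blockDim.y; out[5] = blockDim.z;
    }
}

// Hypothetical host helper: launches the kernel with the given configuration
// and returns {gridDim.x, gridDim.y, gridDim.z, blockDim.x, blockDim.y, blockDim.z}.
std::vector<unsigned int> queryLaunchDims(dim3 grid, dim3 block)
{
    std::vector<unsigned int> dims(6, 0);
    size_t bytes = dims.size() * sizeof(unsigned int);

    unsigned int* dDims = nullptr;
    cudaMalloc(&dDims, bytes);

    recordDims<<<grid, block>>>(dDims);

    cudaMemcpy(dims.data(), dDims, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dDims);
    return dims;
}
```

Calling queryLaunchDims(dim3(2, 3), dim3(4, 2)) should return {2, 3, 1, 4, 2, 1}. Components of a dim3 that are not specified default to 1, which is why a launch such as <<<3,4>>> behaves as a one-dimensional grid of one-dimensional blocks.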

Example 1

Let us say we have two blocks with 4 threads each. To get a unique index for every thread we should use:
index = threadIdx.x + blockIdx.x * blockDim.x;
where blockDim.x gives the number of threads in each block.

threadIdx.x    blockIdx.x * blockDim.x    index
0              0 * 4                      0
1              0 * 4                      1
2              0 * 4                      2
3              0 * 4                      3
0              1 * 4                      4
1              1 * 4                      5
2              1 * 4                      6
3              1 * 4                      7
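
The table above can be reproduced on the device with a sketch along these lines. Each thread stores its own blockIdx.x and threadIdx.x in the slot given by its unique index. The struct IndexTable, the kernel fillIndexTable and the helper buildIndexTable are hypothetical names introduced for illustration; error checking is omitted and a CUDA-capable device is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical container for the result, indexed by the unique thread index.
struct IndexTable {
    std::vector<int> blockOf;   // blockIdx.x of the thread that owns each index
    std::vector<int> threadOf;  // threadIdx.x of that thread
};

__global__ void fillIndexTable(int* blockOf, int* threadOf, int n)
{
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n) {  // guard in case more threads are launched than slots
        blockOf[index]  = blockIdx.x;
        threadOf[index] = threadIdx.x;
    }
}

// Hypothetical host helper: one slot per launched thread.
IndexTable buildIndexTable(int blocks, int threadsPerBlock)
{
    int n = blocks * threadsPerBlock;
    size_t bytes = n * sizeof(int);
    IndexTable table{std::vector<int>(n, -1), std::vector<int>(n, -1)};

    int* dBlock = nullptr;
    int* dThread = nullptr;
    cudaMalloc(&dBlock, bytes);
    cudaMalloc(&dThread, bytes);

    fillIndexTable<<<blocks, threadsPerBlock>>>(dBlock, dThread, n);

    cudaMemcpy(table.blockOf.data(), dBlock, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(table.threadOf.data(), dThread, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dBlock);
    cudaFree(dThread);
    return table;
}
```

With buildIndexTable(2, 4), blockOf should come back as {0, 0, 0, 0, 1, 1, 1, 1} and threadOf as {0, 1, 2, 3, 0, 1, 2, 3}. No slot keeps its initial value of -1 and no two threads write to the same slot, which is what makes the index usable for addressing array elements.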

Code



#include <iostream>
#include <stdio.h>

// Prints the index of each thread within its own block.
__global__ void threadID()
{
	int threadindex = threadIdx.x;
	printf("Executing thread with index =%d within the block \n", threadindex);
}

// Prints the index of the block that each thread belongs to.
__global__ void blockID()
{
	int blockindex = blockIdx.x;
	printf("Executing thread within the block ID %d \n", blockindex);
}

// Prints an index that is unique across the whole grid.
__global__ void unique_index()
{
	int index = threadIdx.x + blockIdx.x * blockDim.x;
	printf("Executing thread with index %d \n", index);
}

int main()
{
	// Every kernel is launched with 3 blocks of 4 threads each.
	threadID<<<3,4>>>();
	cudaDeviceSynchronize();	// wait for the kernel and flush its printf output
	std::cout << std::endl;

	blockID<<<3,4>>>();
	cudaDeviceSynchronize();
	std::cout << std::endl;

	unique_index<<<3,4>>>();
	cudaDeviceSynchronize();
	return 0;
}


Output

Executing thread with index =0 within the block
Executing thread with index =1 within the block
Executing thread with index =2 within the block
Executing thread with index =3 within the block
Executing thread with index =0 within the block
Executing thread with index =1 within the block
Executing thread with index =2 within the block
Executing thread with index =3 within the block
Executing thread with index =0 within the block
Executing thread with index =1 within the block
Executing thread with index =2 within the block
Executing thread with index =3 within the block

Executing thread within the block ID 0
Executing thread within the block ID 0
Executing thread within the block ID 0
Executing thread within the block ID 0
Executing thread within the block ID 1
Executing thread within the block ID 1
Executing thread within the block ID 1
Executing thread within the block ID 1
Executing thread within the block ID 2
Executing thread within the block ID 2
Executing thread within the block ID 2
Executing thread within the block ID 2

Executing thread with index 4
Executing thread with index 5
Executing thread with index 6
Executing thread with index 7
Executing thread with index 0
Executing thread with index 1
Executing thread with index 2
Executing thread with index 3
Executing thread with index 8
Executing thread with index 9
Executing thread with index 10
Executing thread with index 11
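
In the last group of output above, indices 4 to 7 (block 1) are printed before indices 0 to 3 (block 0). CUDA does not guarantee any particular order in which blocks are scheduled, so code should not rely on one. The sketch below is one way to observe this; it records the order in which blocks reach an atomic counter. The names recordBlockOrder and observedBlockOrder are hypothetical, error checking is omitted, and a CUDA-capable device is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Thread 0 of each block takes a ticket from a global counter and stores
// its block ID in the slot given by that ticket.
__global__ void recordBlockOrder(int* order, int* counter)
{
    if (threadIdx.x == 0) {
        int ticket = atomicAdd(counter, 1);  // returns the value before the add
        order[ticket] = blockIdx.x;
    }
}

// Hypothetical host helper: returns the block IDs in the order in which the
// blocks reached the atomicAdd during this particular launch.
std::vector<int> observedBlockOrder(int blocks, int threadsPerBlock)
{
    std::vector<int> order(blocks, -1);
    size_t bytes = order.size() * sizeof(int);

    int* dOrder = nullptr;
    int* dCounter = nullptr;
    cudaMalloc(&dOrder, bytes);
    cudaMalloc(&dCounter, sizeof(int));
    cudaMemset(dCounter, 0, sizeof(int));

    recordBlockOrder<<<blocks, threadsPerBlock>>>(dOrder, dCounter);

    cudaMemcpy(order.data(), dOrder, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dOrder);
    cudaFree(dCounter);
    return order;
}
```

The vector returned by observedBlockOrder(3, 4) always contains each of the block IDs 0, 1 and 2 exactly once, but their order may differ between runs and between devices. The unique index computed from threadIdx.x and blockIdx.x is unaffected by this, because it depends only on a thread's position in the grid and not on when the thread runs.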

Remarks

You might wonder: why are the threads not executing simultaneously?